The packages used in this project are rio (Chan et al., 2021), readr (Wickham & Hester, 2021), and haven (Wickham & Miller, 2021).

Abstract

On April 14th, 1912, on its maiden voyage, the Titanic struck an iceberg. Two hours and 40 minutes later, the ship had sunk and approximately 62% of the passengers had perished. Prior research has attempted to determine the characteristics of those who survived the sinking of the Titanic compared to those who died in order to better understand which attributes were prioritized in life-and-death situations during that era. The purpose of this study is to further explore the three most commonly studied characteristics, namely class, gender, and age, using descriptive statistics, data visualization, and predictive models (e.g., logistic regression and conditional inference classification trees). Logistic regression results indicate that all three demographic attributes are significant predictors of survival. However, classification tree results suggest that gender had the largest effect on survival, followed by class. Interestingly, age was a significant differentiating attribute only among males.

Introduction

Work in brief statement about class system, passengers, and voyage.

The Titanic was a British ocean liner that featured the most advanced technology available in 1912. On its maiden voyage, it collided with an iceberg just before midnight and sank in the 2-degree Celsius waters of the North Atlantic Ocean, killing over two-thirds of the passengers and crew onboard (Balakumar et al., 2019; Frey et al., 2011; Hall, 1986). Many documented factors contributed to so many of the passengers and crew perishing. First, there were not enough lifeboats on board. The Titanic carried twenty lifeboats, which had room for only 52% of the passengers on board at that time (Frey et al., 2011; Hall, 1986; Symanzik et al., 2019), and a portion of the lifeboats that were launched were not full (Frey et al., 2011; Symanzik et al., 2019). Those who did not get a seat in a lifeboat were almost certain to perish because of the frigid ocean temperature (Hall, 1986; Frey et al., 2011), and their chances of rescue were low because partially full lifeboats reportedly made no attempt to save people from the water (Hall, 1986).

In addition to the media inspired by the Titanic disaster, datasets have been released that provide information about the passengers and crew on the Titanic, including their sex, age, nationality, ticket fare, social class, ticket number, passenger or crewmember status, parent status, sibling status, spouse status, and port of embarkation (Balakumar et al., 2019). These data have enabled researchers to determine who was on the Titanic and whether any of these variables were significant predictors of survival or death.

The Titanic took approximately two hours and forty minutes to sink, considerably longer than some other maritime disasters such as the Lusitania, which sank only about 18 minutes after being struck by a torpedo (Frey et al., 2011). It has been hypothesized that this longer interval between the collision and the sinking left room for social norms to operate, whereas passengers on the Lusitania, facing more imminent danger, may have acted on a fight-or-flight response and more selfish interests (Frey et al., 2011). For example, evacuating women and children before men was a social norm and code of conduct in 1912 (Farag & Hassan, 2018). It has also been documented that Captain Edward Smith shouted, “Women and children first” after the Titanic collided with the iceberg (Farag & Hassan, 2018). The longer interval may also have given first-class passengers an advantage over second- and third-class passengers, and second-class passengers an advantage over third-class passengers, because higher-class passengers were more likely to give commands that crewmembers heeded and had the financial means to bargain with crewmembers (Frey et al., 2011).

NOTE: Wealth Gap here - Inline text now optional

As shown in the graph below, when inflation is taken into consideration, the average price of a first-class cabin in 1912 ($150.00) would be $4,241.74 today.

ggplotly(fare_graph)

Frey et al. (2011) explain that lifeboats were stored closest to the first-class cabins, that first-class passengers had more access to information about the disaster, and that they were more likely to have a relationship with the officers who gave orders for loading lifeboats, all of which may have given first-class passengers an advantage in survival. In addition, because the Titanic was perceived to be a British ship, British passengers may have been favored by crewmembers and had a greater chance of survival than passengers who were not British.

Research Questions

  1. What were the characteristics of the passengers of the Titanic who survived or perished?
  2. Were passengers’ class, gender, and age significant predictors of survival?
  3. Which of the three demographic characteristics had the greatest influence on survival?

Methods

Analytic Sample

Since the population of interest for this study is passengers who were aboard the Titanic during its sinking, passengers who disembarked at Cherbourg, Queenstown, or Southampton (n = 35) as well as crew members were excluded from the analyses. Missing data were handled through listwise deletion of two participants who did not have their ages recorded. Thus, the analytic sample consisted of 1,315 passengers. The table below shows a breakdown of the analytic sample by class.

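# Attach the packages used by the table chunks in this document.
library(dplyr)       # group_by(), summarize(), mutate(), %>%
library(janitor)     # adorn_totals()
library(knitr)       # kable()
library(kableExtra)  # kable_classic(), kable_styling()

# Count passengers in each class and express each count as a percentage.
dat %>%
  group_by(class) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  adorn_totals() %>%
  kable(caption = "Breakdown of Passengers by Class",
        col.names = c("Class", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")

Breakdown of Passengers by Class

Class        Count   Percent
1st Class      324     24.64
2nd Class      284     21.60
3rd Class      707     53.76
Total         1315    100.00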

Participant ages ranged from 0-74 years (M = 31.42, SD = 13.92). The table below shows the distribution of ages within each class. First-class passengers were, on average, substantially older than second- and third-class passengers. This may suggest that the trip served a different purpose for first-class passengers, such as recreation and experience rather than business travel or immigration (Hall, 1986).

# Summarize passenger age (mean, SD, minimum, maximum) within each class.
dat %>%
  group_by(class) %>%
  summarize(avg_age = mean(age), std_age = sd(age),
            min_age = min(age), max_age = max(age)) %>%
  kable(caption = "Average Age by Class",
        col.names = c("Class", "Average Age", "SD Age", "Min. Age", "Max. Age"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")

Average Age by Class

Class        Average Age   SD Age   Min. Age   Max. Age
1st Class          39.14    13.55          0         71
2nd Class          30.01    13.90          0         71
3rd Class          25.12    11.71          0         74

The table below lists the nationalities reported by the Titanic’s passengers. The most common nationalities were English (22.43%), American (18.40%), and Irish (9.28%). The majority of first-class passengers were American (60.19%), whereas the majority of second-class passengers were English (51.06%). Third-class passengers were the most diverse group, with the most common nationalities being English (15.84%), Irish (14.85%), Swedish (12.73%), and Syrian/Lebanese (11.74%). These differences in nationality were likely due to the large number of third-class passengers who were immigrating to America (Hall, 1986).

# Count passengers by nationality (excluding missing values) and sort
# the nationalities from most to least common.
dat %>%
  filter(!is.na(nationality2)) %>%
  group_by(nationality2) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  arrange(desc(percent)) %>%
  kable(caption = "Breakdown of Passenger Nationalities",
        col.names = c("Nationality", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_styling(fixed_thead = TRUE, full_width = FALSE, html_font = "Cambria",
                bootstrap_options = c("striped", "hover"))

Breakdown of Passenger Nationalities

Nationality         Count   Percent
English               295     22.43
American              242     18.40
Irish                 122      9.28
Other - Multiple      108      8.21
Swedish               100      7.60
Syrian/Lebanese        85      6.46
Finnish                58      4.41
Canadian               37      2.81
Bulgarian              31      2.36
Croatian               28      2.13
French                 26      1.98
Norwegian              26      1.98
Belgian                25      1.90
Scottish               17      1.29
Channel Islander       15      1.14
Swiss                  13      0.99
Danish                 10      0.76
Italian                 9      0.68
German                  8      0.61
Spanish                 8      0.61
Welsh                   8      0.61
Polish                  6      0.46
Bosnian                 4      0.30
Hong Kongese            4      0.30
South African           4      0.30
Greek                   3      0.23
Lithuanian              3      0.23
Uruguayan               3      0.23
Australian              2      0.15
Chinese                 2      0.15
Portuguese              2      0.15
Slovenian               2      0.15
Austrian                1      0.08
Dutch                   1      0.08
Egyptian                1      0.08
Haitian                 1      0.08
Hungarian               1      0.08
Japanese                1      0.08
Latvian                 1      0.08
Mexican                 1      0.08
Turkish                 1      0.08

Measures

Dependent Variable

The primary outcome of interest was survival status, which was recorded as a dichotomous factor variable (lost or survived).

Independent Variables

Independent variables included class (which serves as a proxy for socioeconomic status), gender, and age. Class was recorded as a three-level factor variable (first-class, second-class, and third-class), whereas gender was recorded as a dichotomous factor variable (female or male). Age (in years) was recorded as a continuous variable.

Analysis

Data analysis was performed using RStudio: Integrated Development Environment for R (RStudio Team, 2021) version 4.1.1. Descriptive statistics were computed to describe the analytic sample and to compare survival rates across demographic subgroups of interest. Density ridges were graphed to visualize survival rate differences for gender and class subgroups across age ranges. Next, a logistic regression model was estimated to examine whether the main effects of gender (reference group = female), class (reference group = first-class), and age were significant predictors of surviving the disaster. To assess how these variables interact to influence survival, as well as which variable was the most influential, a conditional inference classification tree was estimated. Conditional inference trees combine recursive partitioning and statistical inference. This type of classification tree uses a splitting criterion based on Bonferroni-corrected statistical significance testing, which minimizes biases often associated with traditional classification trees (Hothorn et al., 2006). Alpha was set at .05 (i.e., a 95% confidence level) for all multivariate analyses.
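As a rough illustration of the main-effects model described above, the sketch below fits a logistic regression with the same reference groups (female, first class). It is not the analysis reported in this paper: it assumes R's built-in `Titanic` contingency table, which records age only as Child/Adult and whose counts differ from the analytic sample, so its estimates will not match those reported here. The object names (`passengers`, `m_sketch`, `odds_ratios`) are hypothetical.

```r
# Illustrative data: R's built-in Titanic table (datasets package), NOT the
# dataset analyzed in this paper. Age is only recorded as Child/Adult here.
passengers <- as.data.frame(Titanic)

# Keep passengers only (the built-in table also includes crew).
passengers <- droplevels(subset(passengers, Class != "Crew"))

# Set the reference groups named in the text: female and first class.
passengers$Sex   <- relevel(passengers$Sex, ref = "Female")
passengers$Class <- relevel(passengers$Class, ref = "1st")

# Each row is one cell of the table, so Freq is used as a frequency weight.
# Survived has levels No/Yes, so the model describes the odds of "Yes".
m_sketch <- glm(Survived ~ Age + Sex + Class,
                family = binomial(link = "logit"),
                data = passengers,
                weights = Freq)

# Exponentiate the coefficients to obtain odds ratios.
odds_ratios <- exp(coef(m_sketch))
print(round(odds_ratios, 2))
```

Because every predictor is a factor in this illustrative dataset, each odds ratio compares one level with its reference level; with a continuous age variable, as in the paper, the age odds ratio would instead describe the change in odds per additional year.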

Results

Descriptive Statistics of Survival

OVERALL…

# Count passengers by survival outcome and express each count as a percentage.
dat %>%
  group_by(survived) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  adorn_totals() %>%
  kable(caption = "Overall Survival Outcomes",
        col.names = c("Outcomes", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")

Overall Survival Outcomes

Outcomes   Count   Percent
Lost         815     61.98
Saved        500     38.02
Total       1315    100.00

When the descriptive statistics are broken down by class and gender, substantial disparities in survival emerge. As shown in the table below, 62.04% of first-class passengers survived, compared to 41.55% of second-class passengers and 25.60% of third-class passengers.

# Survival outcomes within each class. After summarize(), the data stay
# grouped by class, so the percentages sum to 100 within each class.
dat %>%
  group_by(class, survived) %>%
  summarize(count = n(), .groups = "drop_last") %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  arrange(class, survived) %>%
  kable(caption = "Survival Rate by Class",
        col.names = c("Class", "Survived", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")

Survival Rate by Class

Class        Survived   Count   Percent
1st Class    Lost         123     37.96
1st Class    Saved        201     62.04
2nd Class    Lost         166     58.45
2nd Class    Saved        118     41.55
3rd Class    Lost         526     74.40
3rd Class    Saved        181     25.60

As shown in the table below, 72.75% of female passengers survived compared to 18.96% of male passengers.

# Survival outcomes within each gender. After summarize(), the data stay
# grouped by gender, so the percentages sum to 100 within each gender.
dat %>%
  group_by(gender, survived) %>%
  summarize(count = n(), .groups = "drop_last") %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  arrange(gender, survived) %>%
  kable(caption = "Survival Rate by Gender",
        col.names = c("Gender", "Survived", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")

Survival Rate by Gender

Gender   Survived   Count   Percent
Female   Lost         127     27.25
Female   Saved        339     72.75
Male     Lost         688     81.04
Male     Saved        161     18.96

The table below shows survival rates broken down by both class and gender. Only five first-class female passengers (3.47%) lost their lives, while 96.53% survived. Among first-class male passengers, 65.56% lost their lives and 34.44% survived. Among second-class female passengers, 11.32% perished and 88.68% survived. For second-class male passengers, 86.52% perished and 13.48% survived. Among third-class female passengers, 50.93% lost their lives and 49.07% survived, whereas 84.73% of third-class male passengers lost their lives and 15.27% survived. These differences in rates highlight how class and gender may interact to predict survival.

# Survival outcomes within each class-by-gender subgroup. After summarize(),
# the data stay grouped by class and gender, so the percentages sum to 100
# within each subgroup.
dat %>%
  group_by(class, gender, survived) %>%
  summarize(count = n(), .groups = "drop_last") %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  arrange(class, gender) %>%
  kable(caption = "Survival Rate by Class and Gender",
        col.names = c("Class", "Gender", "Survived", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")

Survival Rate by Class and Gender

Class        Gender   Survived   Count   Percent
1st Class    Female   Lost           5      3.47
1st Class    Female   Saved        139     96.53
1st Class    Male     Lost         118     65.56
1st Class    Male     Saved         62     34.44
2nd Class    Female   Lost          12     11.32
2nd Class    Female   Saved         94     88.68
2nd Class    Male     Lost         154     86.52
2nd Class    Male     Saved         24     13.48
3rd Class    Female   Lost         110     50.93
3rd Class    Female   Saved        106     49.07
3rd Class    Male     Lost         416     84.73
3rd Class    Male     Saved         75     15.27
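
One way to make this possible interaction concrete is to compare the survival odds of women and men within each class, using only the counts in the table above. The sketch below is an illustrative calculation rather than part of the original analysis, and the object names are hypothetical.

```r
# Counts copied from the "Survival Rate by Class and Gender" table.
counts <- data.frame(
  class  = rep(c("1st Class", "2nd Class", "3rd Class"), each = 2),
  gender = rep(c("Female", "Male"), times = 3),
  lost   = c(5, 118, 12, 154, 110, 416),
  saved  = c(139, 62, 94, 24, 106, 75)
)

# Odds of survival = saved / lost within each class-by-gender cell.
counts$odds <- counts$saved / counts$lost

# Female-to-male odds ratio within each class.
female <- counts[counts$gender == "Female", ]
male   <- counts[counts$gender == "Male", ]
or_female_vs_male <- setNames(female$odds / male$odds, female$class)

print(round(or_female_vs_male, 1))
```

If gender mattered equally in every class, the three ratios would be similar. Here they are roughly 53 and 50 in first and second class but only about 5 in third class, which is the kind of pattern that hints at a class-by-gender interaction.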

Furthermore, age was also an important factor that contributed to survival. As shown in the figure below, age…Something here

surv_ageclass_hist

Logistic Regression Model

Results of the main effects logistic regression model predicting survival are shown in the table below. When controlling for the effects of gender and class, age was a significant predictor of survival (OR 0.97; 95% CI 0.95, 0.98; p<0.001). With each additional year in age, passengers’ odds of survival decreased by three percent. When controlling for the effects of age and gender, class affiliation was a significant predictor of survival (second class: OR 0.27; 95% CI 0.18, 0.40; p<0.001; third class: OR 0.10; 95% CI 0.07, 0.15; p<0.001). Compared to first-class passengers, second-class passengers’ and third-class passengers’ odds of surviving the disaster were 73% lower and 90% lower, respectively. Gender was also a significant predictor of survival (OR 0.08; 95% CI 0.06, 0.11; p<0.001), even when controlling for class and age. Male passengers faced 92% lower odds of survival compared to female passengers. Taken together, these results confirm that class, age, and gender each significantly affected survival, even when controlling for one another.

tbl_m1
Characteristic    OR      95% CI        p-value
age               0.97    0.95, 0.98    <0.001
gender
  Female          (reference)
  Male            0.08    0.06, 0.11    <0.001
class
  1st Class       (reference)
  2nd Class       0.27    0.18, 0.40    <0.001
  3rd Class       0.10    0.07, 0.15    <0.001

Note. OR = Odds Ratio, CI = Confidence Interval
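
The percentage statements in the text follow from the tabled odds ratios by simple arithmetic: an odds ratio below 1 corresponds to a (1 - OR) x 100 percent reduction in the odds. The short sketch below reproduces that conversion from the values in the table. The vector names are hypothetical, and the ten-year comparison is an added illustration of how the per-year age effect compounds, not a result reported in the paper.

```r
# Odds ratios copied from the logistic regression table.
or <- c(age = 0.97, male = 0.08, second_class = 0.27, third_class = 0.10)

# Percent reduction in the odds of survival implied by each odds ratio.
pct_lower <- round((1 - or) * 100)
print(pct_lower)

# Odds ratios multiply across units, so a ten-year age difference
# corresponds to 0.97^10 rather than to 10 * 3 percent.
or_ten_years <- or[["age"]]^10
print(round(or_ten_years, 2))
```

The first result gives the 3%, 92%, 73%, and 90% figures quoted in the text. The second shows that a passenger ten years older had about 0.74 times the odds of survival, a reduction of roughly 26% rather than 30%.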

Classification Tree

The figure below shows the results of the conditional inference classification tree used to model survival. The tree’s terminal nodes identified the following eight subgroups:

  1. First-class females
  2. Second-class females
  3. Third-class females
  4. First-class males, 54 years of age or younger
  5. First-class males, older than 54 years of age
  6. Second-class males, nine years of age or younger
  7. Third-class males, nine years of age or younger
  8. Second- and third-class males, older than nine years of age

The terminal nodes’ bar plots indicate the breakdown of survival for each subgroup (black = survival, gray = loss of life). Each gender was stratified by class, suggesting that class was an important predictor of survival for both males and females. However, class had a much smaller effect among women (p = .044) than among men (p < .001). Female subgroups were not split by age, whereas all male subgroups were split by age following class, which indicates that age had a larger effect among males than among females. Furthermore, the age split for first-class males (54 years of age) is substantially higher than the age split among second- and third-class males (nine years of age), which aligns with the wider age distribution of first-class males previously observed in the density ridge graphs. Interestingly, second- and third-class males over the age of nine were not split by class. When examining the model as a whole, the root node was gender (p < .001), suggesting it was the strongest predictor of survival entered into the model. Thus, based on the order of the tree splits, one can hypothesize that gender was the strongest predictor of survival, followed by class and age, respectively.

plot(ctree, main = "Predicting Survival From Gender, Class, and Age")
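
For readers who want to see how a tree of this kind can be fit, the following sketch uses the `ctree()` function from the partykit package with Bonferroni-adjusted tests. It is an illustration under several assumptions rather than the code behind the figure: the paper does not state which package produced its tree, the data are R's built-in `Titanic` table (age recorded only as Child/Adult), and the object names are hypothetical, so the resulting splits will differ from those described above.

```r
# Illustrative data: R's built-in Titanic table, NOT the paper's dataset.
cells <- droplevels(subset(as.data.frame(Titanic), Class != "Crew"))

# Expand the frequency table to one row per passenger.
people <- cells[rep(seq_len(nrow(cells)), times = cells$Freq),
                c("Class", "Sex", "Age", "Survived")]
rownames(people) <- NULL

# Fit the tree only if partykit is installed (an assumed dependency).
if (requireNamespace("partykit", quietly = TRUE)) {
  tree_sketch <- partykit::ctree(
    Survived ~ Sex + Class + Age,
    data = people,
    control = partykit::ctree_control(
      testtype = "Bonferroni",  # Bonferroni-adjusted p-values for splits
      mincriterion = 0.95       # split only when 1 - p exceeds 0.95
    )
  )
  print(tree_sketch)
  plot(tree_sketch, main = "Illustrative Tree on Built-in Titanic Data")
}
```

Setting `mincriterion = 0.95` corresponds to an alpha of .05: a node is split only when the smallest adjusted p-value across the candidate predictors falls below that threshold, and the predictor with the smallest p-value is the one chosen for the split.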

Discussion

NEEDS FINISHED

References

Chan, Chung-hong, Geoffrey CH Chan, Thomas J. Leeper, and Jason Becker. 2021. Rio: A Swiss-Army Knife for Data File i/o.
Wickham, Hadley, and Jim Hester. 2021. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, and Evan Miller. 2021. Haven: Import and Export ’SPSS’, ’Stata’ and ’SAS’ Files. https://CRAN.R-project.org/package=haven.